Dog Bites in New York Dataset Exercise

Tijana Blagojev R-Ladies Belgrade

Aim of the Exercise

  • We will get acquainted with how R works

  • We will learn about different types of variables

  • We will scratch the surface of several R packages, including parts of the tidyverse (dplyr and ggplot2)

  • We will create a dashboard with information contained in the dog bites dataset

First steps

  • After installing R and RStudio, you need to set a working directory where all your work will be stored.

  • The best way to do this is to choose File/New Project, which will automatically store all your information in the same place.

Exercise

  • Go to this link to download the folder with this exercise

  • Open DogsofNewYork.Rproj file

  • Find the presentation here.

R Interface

Packages and Libraries

When you install R, you have basic functions already available within Base R. You can take a look at Introduction to Base R for additional information.

However, to access functions or data written by other people, there are numerous R packages available.

An R package is a bundle of functions (code), data, documentation, and vignettes (examples).

Important note - R is case-sensitive so make sure to check spelling and capitalization!

Packages and Libraries-Code

To use the contents of an R package, you first need to install it (once) and then load it with library() in every session. Use the following code to install packages and load libraries.
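
A minimal sketch of that two-step pattern follows. The package list is an assumption based on the tools named later in this exercise (tidyverse, flexdashboard, DT, plotly); adjust it to whatever the project folder actually uses.

```r
# Install once per machine (uncomment the next line on first use).
# The package list is an assumption based on the tools named in this exercise.
# install.packages(c("tidyverse", "flexdashboard", "DT", "plotly"))

# Load at the start of every session.
library(tidyverse)      # dplyr, ggplot2, readr and related packages
library(flexdashboard)  # dashboard layout for R Markdown
library(DT)             # searchable tables
library(plotly)         # interactive charts with hover information
```

Installing is only needed once, while the library() calls must be repeated whenever R is restarted.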

Simple use of R

Type the following command in your console and press Enter.
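
For instance, a simple arithmetic expression like the one below prints the result shown next (the exact command is an assumption chosen to match that output):

```r
# Basic arithmetic typed straight into the console
2 + 2
```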

## [1] 4

You use <- to create objects in R. It is called the assignment operator.
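
As a small illustration (the object name x and the value are assumptions chosen to match the output below), assignment and printing could look like this:

```r
# Store the value 15 in an object called x (the name is arbitrary)
x <- 15

# Typing the object's name prints its value
x
```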

## [1] 15

Dataset

The dataset on dog bites is taken from the R package nycdogs by Kieran Healy. For our exercise it is adapted to include only the year 2017 and several variables. So let us see what the dataset looks like.

Importing a dataset

First, we will import and inspect a csv file about dog bites in New York City for 2017 with the following code.
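
A hedged sketch of the import step, using read_csv() from readr and glimpse() from dplyr. The object name datadogs is an assumption, and because the real file name is not given here, the example writes two records from the dataset preview to a temporary CSV so that it runs on its own.

```r
library(readr)
library(dplyr)

# Stand-in for the exercise file: write two records from the dataset preview
# to a temporary CSV so that the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "date_of_bite,breed,gender,spay_neuter,borough,zip_code",
  "2017-01-02,Labrador Retriever Crossbreed,Male,No,Brooklyn,11231",
  "2017-01-02,Lhasa Apso,Male,Yes,Brooklyn,11211"
), csv_path)

# Import the CSV; in the exercise, csv_path would point to the real file
datadogs <- read_csv(csv_path)

# Inspect column names, column types, and the first few values
glimpse(datadogs)
```

read_csv() guesses column types from the contents, which is why the date column arrives as a date and the zip code as a double.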

Inspecting dataset

There are 3,072 rows, which we will refer to as observations, and 6 columns, which we will call variables. As you can see, we have different types of variables, such as character, date, and double (continuous).

## Observations: 3,072
## Variables: 6
## $ date_of_bite <date> 2017-01-02, 2017-01-02, 2017-01-04, 2017-01-07, 2017-01…
## $ breed        <chr> "Labrador Retriever Crossbreed", "Lhasa Apso", "Pit Bull…
## $ gender       <chr> "Male", "Male", "Unknown", "Unknown", "Male", "Unknown",…
## $ spay_neuter  <chr> "No", "Yes", "No", "No", "Yes", "No", "No", "No", "No", …
## $ borough      <chr> "Brooklyn", "Brooklyn", "Brooklyn", "Brooklyn", "Brookly…
## $ zip_code     <dbl> 11231, 11211, 11219, 11216, 11216, 11229, 11216, 11206, …

Variables

Numeric and Categorical

Numeric can be:

  • Integer: Age, number of kittens

  • Double (Continuous): Height, weight

Categorical:

  • Character: Black, yellow, white

  • Factor (Ordinal): Cold, mild, warm, hot
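
The four types above can be tried out directly in the console. This is a small sketch in base R; the object names are made up for illustration.

```r
# Numeric: integer (the L suffix marks a whole-number literal)
number_of_kittens <- 3L
class(number_of_kittens)   # "integer"

# Numeric: double (continuous values)
height_cm <- 172.5
class(height_cm)           # "numeric"
typeof(height_cm)          # "double"

# Categorical: character
colour <- c("black", "yellow", "white")
class(colour)              # "character"

# Categorical: ordered factor, with levels listed from lowest to highest
temperature <- factor(
  c("mild", "hot", "cold", "warm"),
  levels = c("cold", "mild", "warm", "hot"),
  ordered = TRUE
)
temperature < "warm"       # TRUE FALSE TRUE FALSE
```

Because the factor is ordered, comparisons such as "colder than warm" are meaningful, which would not be the case for plain character values.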

Creating R markdown dashboard presentation

In the top left corner, click the document icon with the plus sign and choose R Markdown. Then open the Flex Dashboard template.

Flexdashboard Template

Setting up the Appearance of Flexdashboard

Pipe operator

The tidyverse provides a so-called “pipe” operator, %>%. It passes the result of the left-hand side as the first argument of the function on the right-hand side. It is used to chain multiple operations on data together.
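
To make this concrete, here is a small sketch comparing nested calls with the piped equivalent. The numbers are arbitrary.

```r
library(dplyr)  # loading dplyr makes %>% available

# Nested calls are read from the inside out
nested <- round(sqrt(sum(c(1, 4, 9))), 1)

# The same computation with the pipe reads left to right:
# each result becomes the first argument of the next function
piped <- c(1, 4, 9) %>%
  sum() %>%
  sqrt() %>%
  round(1)

nested
piped
```

Both versions give the same value; the piped form simply lists the steps in the order they happen.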

Setup part of the R-markdown-Dashboard Code

In the Setup part of the code, we will import the dog bites dataset and create a subset with the number of bites per borough, which we will use in the textual part of our dashboard.

Number of Bites per Borough in New York

Now let us take a look at the 5 boroughs with the highest number of bites.

## # A tibble: 5 x 3
##   borough           n  perc
##   <chr>         <int> <dbl>
## 1 Queens          817    27
## 2 Brooklyn        690    22
## 3 Manhattan       663    22
## 4 Bronx           506    16
## 5 Staten Island   284     9
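
A table like the one above could be produced with dplyr roughly as follows. This is a sketch under assumptions: the object names are illustrative, the rows are rebuilt from the counts shown, and the 112 bites not listed among the five boroughs are given the placeholder label "Other".

```r
library(dplyr)

# Rebuild one row per bite from the counts in the table above.
# "Other" is a placeholder label (an assumption) for the 112 rows
# that are not listed among the five boroughs.
datadogs <- tibble(
  borough = rep(
    c("Queens", "Brooklyn", "Manhattan", "Bronx", "Staten Island", "Other"),
    times = c(817, 690, 663, 506, 284, 112)
  )
)

bites_per_borough <- datadogs %>%
  count(borough, sort = TRUE) %>%             # bites per borough, largest first
  mutate(perc = round(n / sum(n) * 100)) %>%  # share of all 3,072 bites, in percent
  head(5)                                     # keep the top five

bites_per_borough
```

The percentages are computed before the top five are selected, so they refer to all bites rather than only to the five boroughs shown.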

Textual part of the dashboard

We will use a backtick `, followed by r and some function, closed with another backtick, as inline code that automatically inserts information into the text. If we use a subset for another year, the data in the text will update straight away. To access a particular value in a dataset, you can use the following code, where the first number is the row number and the second is the column number.

## # A tibble: 1 x 1
##   borough
##   <chr>  
## 1 Queens
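
The indexing described above can be sketched like this. The object name bites_per_borough is an assumption; the values are copied from the top-five table.

```r
library(dplyr)

# Top-five table, copied from the output earlier in this exercise
bites_per_borough <- tibble(
  borough = c("Queens", "Brooklyn", "Manhattan", "Bronx", "Staten Island"),
  n       = c(817L, 690L, 663L, 506L, 284L),
  perc    = c(27, 22, 22, 16, 9)
)

# [row, column] on a tibble returns a 1 x 1 tibble
bites_per_borough[1, 1]

# Selecting the column first and then the row returns the bare value
bites_per_borough$borough[1]   # "Queens"
bites_per_borough$n[1]         # 817

# In the dashboard text the same expression goes between backticks, e.g.
# `r bites_per_borough$borough[1]` had the highest number of bites.
```

Either form can be used inline; the bare value avoids any table formatting appearing in the sentence.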

Textual part of the dashboard-Code

Textual part of the dashboard result

Congratulations, you just coded and knitted your first dashboard!!!

Creating a Searchable Datatable

First, in the Setup part of our dashboard document, we will create a table without the last column, which contains zip codes.

Now we will add a searchable table in the second row of the first column, designated with ###, with the help of the DT package.

Dashboard progress
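
Both steps might look roughly like this. The object names are assumptions, and two records from the dataset preview stand in for the full data so that the example runs on its own.

```r
library(dplyr)
library(DT)

# Two records from the dataset preview, standing in for the full data
datadogs <- tibble(
  date_of_bite = as.Date(c("2017-01-02", "2017-01-02")),
  breed        = c("Labrador Retriever Crossbreed", "Lhasa Apso"),
  gender       = c("Male", "Male"),
  spay_neuter  = c("No", "Yes"),
  borough      = c("Brooklyn", "Brooklyn"),
  zip_code     = c(11231, 11211)
)

# Setup part: drop the zip code column
datadogs_table <- datadogs %>% select(-zip_code)

# Dashboard panel (under its ### heading): a searchable, sortable table
datatable(datadogs_table)
```

datatable() adds the search box and sortable column headers without further configuration.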

Creating a Bar Chart

First, we will create a subset to see which three breeds bite the most. We will again put this part of the code in the first Setup part of our R dashboard/R Markdown file. We will also change the breed variable from character to factor.
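
A sketch of that subset follows. The breed names and counts here are placeholders invented for illustration, not values from the dataset, and the object names are assumptions.

```r
library(dplyr)

# Toy data: placeholder breed names with invented counts
datadogs <- tibble(
  breed = rep(c("Breed A", "Breed B", "Breed C", "Breed D"),
              times = c(5, 3, 2, 1))
)

top_breeds <- datadogs %>%
  count(breed, sort = TRUE) %>%   # bites per breed, largest first
  head(3) %>%                     # keep the top three
  mutate(breed = factor(breed))   # convert character to factor

top_breeds
class(top_breeds$breed)           # "factor"
```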

Using ggplot and plotly

We will use two packages: one (ggplot2) to make a bar graph, and another (plotly) to make the graph’s information pop up when hovering. ggplot2 is a package created by Hadley Wickham that is based on the grammar of graphics.

Grammar of Graphics

The grammar of graphics lets you specify the building blocks of a plot and combine them to create the graphical display you want:

  • data

  • aesthetic mapping

  • geometric object

  • statistical transformations

  • scales

  • coordinate system

  • position adjustments

  • faceting
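
The building blocks above map onto ggplot2 code fairly directly. The sketch below uses the borough counts from the table earlier in this exercise; the object names are assumptions.

```r
library(ggplot2)

# data: bites per borough, from the top-five table earlier in this exercise
bites <- data.frame(
  borough = c("Queens", "Brooklyn", "Manhattan", "Bronx", "Staten Island"),
  n       = c(817, 690, 663, 506, 284)
)

p <- ggplot(bites, aes(x = borough, y = n)) +     # data + aesthetic mapping
  geom_col(position = "stack") +                  # geometric object + position adjustment
  scale_y_continuous(name = "Number of bites") +  # scale
  coord_flip()                                    # coordinate system

# geom_col() leaves the values as they are (an identity statistical
# transformation), and without a facet_*() call the plot is a single panel.
p
```

Each block is added with +, so a plot can be built up and changed one layer at a time.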

Creating bar graph

Instead of Chart B we will write “Three breeds with highest number of bites in 2017” and use this code for a bar chart.
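
A bar chart along these lines could be sketched as follows. The breed names and counts are placeholders invented for illustration, and the object names are assumptions.

```r
library(ggplot2)
library(plotly)

# Placeholder top-three table; breed names and counts are invented
top_breeds <- data.frame(
  breed = factor(c("Breed A", "Breed B", "Breed C")),
  n     = c(5, 3, 2)
)

bar <- ggplot(top_breeds, aes(x = breed, y = n, fill = breed)) +
  geom_col() +                               # one bar per breed
  labs(x = "Breed", y = "Number of bites") +
  theme(legend.position = "none")            # the x axis already names the breeds

# Convert the static ggplot into an interactive chart with hover information
ggplotly(bar)
```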

Bar Chart

Dashboard Progress

Final Stage - Braaavoo!!!!

Stacked Bar of Spayed/Neutered Dogs

In this final part, we will create a stacked bar chart that shows how many of the dogs that bit were spayed/neutered and how many of them were male or female. So, in the Setup part, we will again create a subset, this time grouped by spay/neuter and gender. We will also create another column to use as a pop-up label.

The datadogsgenderspay subset

spay_neuter  gender   n     Info
No           Female   271   <br> Spay/Neuter: No <br> Number of bites: 271 <br> Gender: Female <br>
No           Male     682   <br> Spay/Neuter: No <br> Number of bites: 682 <br> Gender: Male <br>
No           Unknown  1063  <br> Spay/Neuter: No <br> Number of bites: 1063 <br> Gender: Unknown <br>
Yes          Female   290   <br> Spay/Neuter: Yes <br> Number of bites: 290 <br> Gender: Female <br>
Yes          Male     755   <br> Spay/Neuter: Yes <br> Number of bites: 755 <br> Gender: Male <br>
Yes          Unknown  11    <br> Spay/Neuter: Yes <br> Number of bites: 11 <br> Gender: Unknown <br>
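
One way a subset like this could be built is sketched below. The name datadogs for the full data is an assumption, and the rows are rebuilt from the six counts in the table (which add up to the 3,072 observations) so that the example runs on its own.

```r
library(dplyr)

# Rebuild one row per bite from the six counts in the table above
group_sizes <- c(271, 682, 1063, 290, 755, 11)
datadogs <- tibble(
  spay_neuter = rep(c("No", "No", "No", "Yes", "Yes", "Yes"), times = group_sizes),
  gender      = rep(c("Female", "Male", "Unknown", "Female", "Male", "Unknown"),
                    times = group_sizes)
)

datadogsgenderspay <- datadogs %>%
  count(spay_neuter, gender) %>%   # bites per spay/neuter and gender combination
  mutate(Info = paste0(            # pop-up label; <br> starts a new line in the tooltip
    "<br> Spay/Neuter: ", spay_neuter,
    " <br> Number of bites: ", n,
    " <br> Gender: ", gender, " <br>"
  ))

datadogsgenderspay
```

count() does the grouping and counting in one step, and paste0() glues the fixed text and the column values into one label per row.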

Creating stacked bar graph

Instead of Chart C we will write the centered title “Bites based on dog’s gender and whether they were spayed/neutered {align=center}” and use this code for a stacked bar:
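
A stacked bar of this kind might be sketched as follows. The counts and the Info labels come from the datadogsgenderspay table; the axis labels and the object name stacked are assumptions.

```r
library(ggplot2)
library(plotly)

# Counts from the datadogsgenderspay table
datadogsgenderspay <- data.frame(
  spay_neuter = rep(c("No", "Yes"), each = 3),
  gender      = rep(c("Female", "Male", "Unknown"), times = 2),
  n           = c(271, 682, 1063, 290, 755, 11)
)
datadogsgenderspay$Info <- paste0(
  "<br> Spay/Neuter: ", datadogsgenderspay$spay_neuter,
  " <br> Number of bites: ", datadogsgenderspay$n,
  " <br> Gender: ", datadogsgenderspay$gender, " <br>"
)

# geom_col() stacks the gender segments within each spay/neuter bar by default.
# The text aesthetic is not drawn by ggplot2 itself but is picked up by plotly.
stacked <- ggplot(datadogsgenderspay,
                  aes(x = spay_neuter, y = n, fill = gender, text = Info)) +
  geom_col() +
  labs(x = "Spayed/neutered", y = "Number of bites", fill = "Gender")

# Show only the Info label when hovering over a segment
ggplotly(stacked, tooltip = "text")
```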

Stacked Bar

Dashboard Completed

Great Work and Thank you!